library(needs)
needs(igraph,
igraphdata,
netseg)
data("UKfaculty")
Theoretical Insights
Placeholder: description of the theory and the various approaches.
Text on all the different approaches to embeddedness and social capital etc.
Measures of reciprocity are based on the dyad census. Any pair of vertices in a network is called a dyad. In a directed network, a dyad can be in one of three states: a) null (no edge), b) asymmetric (one directed edge) and c) mutual (two directed edges).
null <- graph_from_literal(1, 2)
as <- graph_from_literal(1-+2)
mutual <- graph_from_literal(1-+2, 2-+1)
g <- graph_from_literal(1-+2, 2-+3, 2+-3, 5-+2, 3--+5, 5-+3, 3-+4)
layout_line <- cbind(1:2, rep(1, 2))
plot(null,
vertex.color = "lavender",
vertex.size = 30,
layout = layout_line)
plot(as,
vertex.color = "lavender",
vertex.size = 30,
layout = layout_line)
plot(mutual,
vertex.color = "lavender",
vertex.size = 30,
layout = layout_line)
# Plot the network with lavender vertices
plot(g,
vertex.color = "lavender",
vertex.size = 30,
edge.width = 2,
edge.arrow.size = 0.1,
layout = layout_with_fr)
| null | asymmetric | mutual |
|---|---|---|
| 5 | 3 | 2 |
The level of reciprocity shows whether a network mainly consists of mutual or asymmetric edges. In R it can be calculated in ‘default’ mode (number of reciprocated edges divided by the total number of edges) or in ‘ratio’ mode (number of mutual dyads divided by the number of connected dyads, i.e. mutual plus asymmetric dyads).
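As a minimal sketch of the two reciprocity modes, the following block rebuilds the small example graph under a new name (`g_example` is a name introduced here, not part of the original code) and checks igraph's `reciprocity()` against values computed by hand from the dyad census.

```r
library(igraph)

# Same edges as the small example graph plotted above
g_example <- graph_from_literal(1-+2, 2-+3, 2+-3, 5-+2, 3-+5, 5-+3, 3-+4)

# Dyad census: number of mutual, asymmetric and null dyads
dc <- dyad_census(g_example)

# 'default' mode: reciprocated edges / all edges
rec_default <- reciprocity(g_example, mode = "default")
manual_default <- 2 * dc$mut / ecount(g_example)

# 'ratio' mode: mutual dyads / (mutual + asymmetric dyads)
rec_ratio <- reciprocity(g_example, mode = "ratio")
manual_ratio <- dc$mut / (dc$mut + dc$asym)

print(c(default = rec_default, ratio = rec_ratio))
```

With 2 mutual and 3 asymmetric dyads, the graph has 7 edges, 4 of which are reciprocated, so the two modes give different values for the same network.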
Any set of 3 vertices is called a triad. In directed networks, 16 different triads exist (Davis and Leinhardt 1967).
# Define all 16 triad types using adjacency matrices
triad_matrices <- list(
matrix(c(0,0,0, 0,0,0, 0,0,0), nrow=3, byrow=TRUE), # 003
matrix(c(0,1,0, 0,0,0, 0,0,0), nrow=3, byrow=TRUE), # 012
matrix(c(0,1,0, 1,0,0, 0,0,0), nrow=3, byrow=TRUE), # 102
matrix(c(0,1,1, 0,0,0, 0,0,0), nrow=3, byrow=TRUE), # 021D
matrix(c(0,0,0, 1,0,0, 1,0,0), nrow=3, byrow=TRUE), # 021U
matrix(c(0,1,0, 0,0,1, 0,0,0), nrow=3, byrow=TRUE), # 021C
matrix(c(0,1,0, 1,0,0, 1,0,0), nrow=3, byrow=TRUE), # 111D
matrix(c(0,1,0, 1,0,1, 0,0,0), nrow=3, byrow=TRUE), # 111U
matrix(c(0,1,1, 0,0,0, 0,1,0), nrow=3, byrow=TRUE), # 030T
matrix(c(0,1,0, 0,0,1, 1,0,0), nrow=3, byrow=TRUE), # 030C
matrix(c(0,1,1, 1,0,0, 1,0,0), nrow=3, byrow=TRUE), # 201
matrix(c(0,1,1, 0,0,1, 0,1,0), nrow=3, byrow=TRUE), # 120D
matrix(c(0,0,0, 1,0,1, 1,1,0), nrow=3, byrow=TRUE), # 120U
matrix(c(0,1,1, 0,0,1, 1,0,0), nrow=3, byrow=TRUE), # 120C
matrix(c(0,0,1, 1,0,1, 1,1,0), nrow=3, byrow=TRUE), # 210
matrix(c(0,1,1, 1,0,1, 1,1,0), nrow=3, byrow=TRUE) # 300 (mutual all)
)
triad_names <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U", "030T", "030C",
"201", "120D", "120U", "120C", "210", "300")
# Plot each triad
par(mfrow = c(4, 4), mar = c(1, 1, 2, 1)) # 4x4 grid layout
for (i in 1:16) {
adj <- triad_matrices[[i]]
# use a separate name so that the example graph g is not overwritten
triad_graph <- graph_from_adjacency_matrix(adj, mode = "directed")
plot(
triad_graph,
main = triad_names[i],
vertex.color = "lavender",
vertex.size = 30,
vertex.label = NA,
vertex.label.cex = 0,
edge.arrow.size = 0.3,
layout = layout_in_circle
)
}
Codes such as 102, 021C or 120U may feel cryptic at first, but they follow a logical pattern:
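Basic structure: three-digit code
The three digits count, in this order, the mutual, asymmetric and null dyads in the triad.
Example: ‘102’
| Digit | Meaning |
|---|---|
| 1 | 1 mutual (reciprocal) relationship (e.g., A ↔ B) |
| 0 | 0 one-way (asymmetric) relationships |
| 2 | 2 node pairs have no relationship (null dyads) |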
Add-on lettering:
For more complex cases, a letter is added to describe the triad’s shape or direction (D = down, U = up, C = cyclic, T = transitive):
| Code | Description |
|---|---|
| 021D | Two one-way edges pointing down from one node (A ← B → C) |
| 021U | Two one-way edges pointing up to one node (A → B ← C) |
| 021C | A chain of two one-way edges (A → B → C) |
| 030T | A transitive triangle (A → B → C and A → C) |
| 030C | A cycle with three one-way edges (A → B → C → A) |
| 120D | One mutual edge (A ↔ C) plus two one-way edges pointing down (A ← B → C) |
| 120U | One mutual edge (A ↔ C) plus two one-way edges pointing up (A → B ← C) |
| 120C | One mutual edge (A ↔ C) plus a chain of two one-way edges (A → B → C) |
| 210 | Two mutual dyads and one one-way edge |
| 300 | All three dyads are mutual (fully connected triad) |
We can compute the triad census of a network using the igraph function triad_census().
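A short sketch of a triad census, assuming the small directed example graph from the reciprocity section (rebuilt here as `g_example`, a name chosen for this illustration). igraph returns the 16 counts without names, so the labels are attached by hand in the order given in the igraph documentation.

```r
library(igraph)

# Small directed example graph (5 vertices, 7 edges)
g_example <- graph_from_literal(1-+2, 2-+3, 2+-3, 5-+2, 3-+5, 5-+3, 3-+4)

# Count how often each of the 16 triad types occurs
tc <- triad_census(g_example)

# Label the counts in the order documented for triad_census()
names(tc) <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
               "030T", "030C", "201", "120D", "120U", "120C", "210", "300")

print(tc)

# Every set of three vertices is classified exactly once
n_triads <- choose(vcount(g_example), 3)
```

The counts always sum to the number of vertex triples, which is a quick sanity check when working with larger networks.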
In undirected networks only 4 distinct triads exist.
# Define the 4 undirected triad types
undirected_triad_matrices <- list(
matrix(c(0,0,0, 0,0,0, 0,0,0), nrow=3, byrow=TRUE), # 0 edges: Empty (Type: 003)
matrix(c(0,1,0, 1,0,0, 0,0,0), nrow=3, byrow=TRUE), # 1 edge: One dyad (Type: 102)
matrix(c(0,1,1, 1,0,0, 1,0,0), nrow=3, byrow=TRUE), # 2 edges: Open triad (Type: 201)
matrix(c(0,1,1, 1,0,1, 1,1,0), nrow=3, byrow=TRUE) # 3 edges: Complete triad (Type: 300)
)
undirected_triad_names <- c("003", "102", "201", "300")
# Plot each undirected triad
par(mfrow = c(2, 2), mar = c(1, 1, 2, 1)) # 2x2 grid layout
for (i in 1:4) {
adj <- undirected_triad_matrices[[i]]
graph <- graph_from_adjacency_matrix(adj, mode = "undirected")
plot(
graph,
main = undirected_triad_names[i],
vertex.color = "lavender",
vertex.label = NA,
vertex.size = 30,
vertex.label.cex = 0,
layout = layout_in_circle
)
}
Segregation: network level property
Segregation refers to the extent to which actors and social groups are separated from one another. Segregation exists if the share of ties between groups is much smaller than the share of in-group ties, or if the groups are fully separated from each other.
The literature does not agree on the exact terminology of homophily and segregation. To some authors, homophily refers to both the macro-phenomenon (lack of intergroup relationships) and the micro-mechanism (similarity preferences). To others, homophily refers to the macro-phenomenon, and homophilic or homophilous selection refers to the micro-mechanism.
I concur with other researchers on the terminology of homophily (preferences) for the mechanism and segregation for the macro-pattern.
# Check the attributes available in the dataset
vertex_attr_names(UKfaculty)
# Assuming 'Group' is an attribute in the dataset
# Assign different colors to each unique group
group_colors <- rainbow(length(unique(V(UKfaculty)$Group)))
# Create a vector of colors for each vertex based on its group
vertex_colors <- group_colors[V(UKfaculty)$Group]
# Plot the graph with vertex colors based on the 'Group' attribute
plot(UKfaculty, vertex.color = vertex_colors)
Homophily denotes the social phenomenon that people tend to be similar to their social contacts.
Billie Eilish’s song “Birds of a Feather” shares its title with one of the most cited homophily papers, McPherson, Smith-Lovin and Cook (2001), which summarises a wide range of homophily studies.
Previous research has found evidence of homophily regarding race and ethnicity, gender and age, religion, education, occupation and social class, behaviour, and attitudes and beliefs.
Definition: from what point on can we say that a network is homophilous or segregated?
Other, secondary types of homophily can reinforce segregation on an attribute. Example: ethnic segregation can be reinforced by socioeconomic homophily when ethnic origin (node color) and socioeconomic status (node shape) overlap (e.g., many ethnic minority members have a lower socioeconomic status and many ethnic majority members have a higher socioeconomic status).
For an explanation, see Wimmer and Lewis (2010, 588–94).
Let \(e_{AB}\) be the number of edges between two types of actors \(A\) and \(B\) (e.g. different faculties) and \(m\) be the total number of edges in the network. Then homophily exists if \(\frac{e_{AB}}{m}\) is significantly lower than the probability of a random connection between an actor of type \(A\) and an actor of type \(B\).
We can also use the assortativity coefficient for measurement.
Assortativity coefficients are essentially correlation coefficients. For a categorical variable, let \(e_{ij}\) be the fraction of edges that run from actors of type \(i\) to actors of type \(j\), and let \(a_i = \sum_j e_{ij}\) and \(b_i = \sum_j e_{ji}\) be the fractions of edges that start and end at actors of type \(i\). The coefficient is then
\[ r = \frac{\sum_{i}e_{ii}-\sum_{i}a_ib_i}{1-\sum_{i}a_ib_i} \]
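To make the formula concrete, here is a minimal sketch that builds the mixing matrix \(e_{ij}\) by hand for a small undirected toy graph and compares the result with igraph's `assortativity_nominal()`. The graph `g_toy` and the group labels derived from the vertex names are made up for illustration. In an undirected graph, each edge is counted once in both directions.

```r
library(igraph)

# Toy graph: a triangle of "a" vertices, a chain of "b" vertices, one bridge
g_toy <- graph_from_literal(a1-a2, a2-a3, a1-a3, b1-b2, b2-b3, a3-b1)

# Group membership taken from the first letter of the vertex name (1 = a, 2 = b)
types <- as.integer(factor(substr(V(g_toy)$name, 1, 1)))

# Count edges by the types of their endpoints, in both directions
k <- max(types)
el <- as_edgelist(g_toy, names = FALSE)
counts <- matrix(0, k, k)
for (row in seq_len(nrow(el))) {
  i <- types[el[row, 1]]
  j <- types[el[row, 2]]
  counts[i, j] <- counts[i, j] + 1
  counts[j, i] <- counts[j, i] + 1
}

# Mixing matrix e_ij and its marginals a_i, b_i
e <- counts / sum(counts)
a <- rowSums(e)
b <- colSums(e)

# Assortativity coefficient from the formula above
r_manual <- (sum(diag(e)) - sum(a * b)) / (1 - sum(a * b))

# The same quantity as computed by igraph
r_igraph <- assortativity_nominal(g_toy, types, directed = FALSE)

print(c(manual = r_manual, igraph = r_igraph))
```

Five of the six edges stay within a group, so the coefficient is clearly positive.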
Another option is the odds ratio of within-group ties (Bojanowski and Corten 2014).
In the example, it tells us that the odds of a same-group tie are 12.866 times the odds of a between-group tie.
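The logic of such an odds ratio can be sketched by hand. The block below is an illustration under assumptions, not the netseg implementation: it uses a made-up undirected toy graph (`g_toy`) and defines the odds of a tie as ties divided by non-ties, separately for same-group and between-group dyads.

```r
library(igraph)

# Toy graph: a triangle of "a" vertices, a chain of "b" vertices, one bridge
g_toy <- graph_from_literal(a1-a2, a2-a3, a1-a3, b1-b2, b2-b3, a3-b1)
group <- substr(V(g_toy)$name, 1, 1)

adj <- as_adjacency_matrix(g_toy, sparse = FALSE)
same_group <- outer(group, group, "==")
dyads <- upper.tri(adj)  # look at every unordered pair exactly once

# Ties and non-ties among same-group dyads
within_ties <- sum(adj[dyads & same_group])
within_nonties <- sum(dyads & same_group) - within_ties

# Ties and non-ties among between-group dyads
between_ties <- sum(adj[dyads & !same_group])
between_nonties <- sum(dyads & !same_group) - between_ties

# Odds of a within-group tie relative to the odds of a between-group tie
odds_ratio <- (within_ties / within_nonties) / (between_ties / between_nonties)
print(odds_ratio)
```

A value above 1 indicates that ties are more likely within groups than between them. The ratio is undefined when a group has no non-ties or no between-group ties exist.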
A further option is Coleman’s index (Coleman 1958), which compares the proportion of same-group neighbours to the proportion of that group in the network as a whole. It is a number between -1 and 1. A value of 0 means that these proportions are equal. A value of 1 means that all ties sent from a particular group go to members of the same group. A value of -1 means the opposite: all ties are sent to members of other groups.
Measuring homophily preferences is difficult because we cannot observe preferences directly. In the social networks literature, the most common ways to estimate the strength of homophily preferences are Exponential Random Graph Models (ERGMs) and Stochastic Actor-Oriented Models (SAOMs). These model the existence of ties as a function of actor attributes (such as similarity). While we will only be talking about these models in week 07, I still want to point out one caveat: ERGMs and SAOMs rely on a revealed preference assumption and infer preferences from the observed network structure.
Another approach to testing homophily preferences is the permutation test. A permutation is the result of a random reshuffle of the row and column names in an adjacency matrix. This creates a random network with the same structural properties (e.g., reciprocated ties) as the observed network. Who is connected to whom, however, is now different. The level of segregation in the permuted network depends solely on the network composition. If we permute the network a large number of times, we obtain a distribution of the segregation values expected from the network composition alone. We can then test whether the observed segregation differs significantly from this distribution. A significant difference tells us that preferences are (likely) the reason for the difference between the mean of the permutations and the observation. We do not know, however, which preferences are responsible!
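Before turning to the school data, the idea can be tried on a self-contained toy example. This sketch makes two simplifications that are assumptions of the illustration: it shuffles the group labels across vertices (equivalent to reshuffling the row and column names), and it summarises the result with an empirical p-value rather than a t-test. The graph `g_toy` is made up.

```r
library(igraph)

set.seed(42)  # for replicability

# Toy graph: two triangles ("a" and "b") joined by a single bridge
g_toy <- graph_from_literal(a1-a2, a2-a3, a1-a3, b1-b2, b2-b3, b1-b3, a3-b1)
group <- as.integer(factor(substr(V(g_toy)$name, 1, 1)))

# Observed segregation
observed <- assortativity_nominal(g_toy, group, directed = FALSE)

# Reshuffle the group labels many times while keeping the network structure
n_perm <- 1000
permuted <- replicate(
  n_perm,
  assortativity_nominal(g_toy, sample(group), directed = FALSE)
)

# Share of permutations that are at least as segregated as the observation
p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)

print(c(observed = observed, p_value = p_value))
```

The empirical p-value asks how often chance alone produces at least the observed segregation, so it does not become significant merely because the number of permutations is large.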
NPerm <- 100 # number of permutations
# create empty lists for the permuted matrices and the segregation measures
permlist <- as.list(1:NPerm)
seglist <- as.list(1:NPerm)
# set a random seed for replicability
set.seed(213)
for (perm in 1:NPerm) {# loop over all permutations
# pick the original matrix
mat <- friendmats_50[[1]]
# re-sample (without replacement!) the names
new_names <- sample(x = rownames(mat),
size = length(rownames(mat)),
replace = FALSE)
# reassign the names
rownames(mat) <- new_names
colnames(mat) <- new_names
# save the matrix
permlist[[perm]] <- mat
# create a graph
g_perm <- graph_from_adjacency_matrix(adjmatrix = mat,
mode = "directed")
# re-assign the smoking behavior based on the (permuted) node names
V(g_perm)$smoking <- dat_50$smoking[match(V(g_perm)$name, dat_50$idstud)]
# calculate the assortativity coefficient
seglist[[perm]] <- assortativity_nominal(graph = g_perm,
types = V(g_perm)$smoking + 1)
}
# unlist the permuted assortativity coefficients
assorts <- unlist(seglist)
# test whether the mean of the permuted assortativity coefficients differs from the observed assortativity
# (g is assumed to be the observed friendship graph with a 'smoking' vertex attribute)
t.test(x = assorts,
mu = assortativity_nominal(g, types = V(g)$smoking + 1))
Triadic closure is a concept in social network theory that was first introduced by Georg Simmel. It describes the tendency of two nodes \(A\) and \(B\) to become connected if they share a common neighbor \(C\). It can be used to understand and predict the formation of social ties in networks (of course, many other factors influence tie formation as well).
The two most common measures of triadic closure for a graph are clustering coefficient and transitivity.
A clustering coefficient is a measure of the degree to which vertices in a graph tend to cluster together. Evidence suggests that in most social networks, groups tend to create highly clustered subgroups.
Global clustering coefficient or transitivity:
In an undirected graph a connected triple is a path of length 2 (i.e. triad 201). A triangle is a cycle of exactly three nodes (i.e. triad 300). The global clustering coefficient is then defined as
\[ G_g = \frac{(\text{number of triangles} \times 3)}{(\text{number of connected triples})} \]
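As a small worked example of this formula, the sketch below counts triangles and connected triples by hand in a four-vertex toy graph (a triangle with one pendant vertex, made up for illustration) and compares the result with igraph's `transitivity()`.

```r
library(igraph)

# Triangle 1-2-3 with a pendant vertex 4 attached to vertex 3
g_small <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)

# count_triangles() reports triangles per vertex, so each one is counted 3 times
n_triangles <- sum(count_triangles(g_small)) / 3

# A vertex with degree d is the centre of choose(d, 2) connected triples
n_triples <- sum(choose(degree(g_small), 2))

# Global clustering coefficient from the formula above
gcc_manual <- 3 * n_triangles / n_triples

# The same quantity as computed by igraph
gcc_igraph <- transitivity(g_small, type = "global")

print(c(manual = gcc_manual, igraph = gcc_igraph))
```

Here there is one triangle and five connected triples, three of which are closed.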
Local clustering coefficient: The local clustering coefficient indicates the extent of clustering around a single vertex, that is, how close its neighbours are to being a clique (complete graph).
Let \(G = (V, E)\) be an undirected simple graph¹ with vertex set \(V\) and edge set \(E\). \(n\) is the total number of vertices in the graph, \(m\) the total number of edges.
The neighbourhood \(N_i\) of a vertex \(v_i\) consists of its immediate neighbours and is defined as follows:
\[ N_i = \{v_j : e_{ij} \in E \wedge e_{ji} \in E\} \]
We define \(k_i\) as the number of vertices \(|N_i|\) in the neighbourhood of \(v_i\). The local clustering coefficient of a vertex \(v_i\) is the number of edges between the vertices within its neighbourhood divided by the number of edges that could possibly exist between them. In an undirected graph, at most \(k_i(k_i-1)/2\) such edges are possible.
Thus we define the local clustering coefficient of a vertex \(v_i\) as:
\[ C_i = \frac{2\,|\{e_{jk} : v_j, v_k \in N_i, e_{jk} \in E\}|}{k_i (k_i-1)} \]
where \(e_{jk}\) denotes an edge between two neighbours \(v_j\) and \(v_k\) of \(v_i\), and \(k_i\) is the number of vertices in the neighbourhood of \(v_i\).
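The definition can be checked with a minimal sketch for undirected graphs. The helper `local_cc()` is hypothetical and written only for this illustration. It counts the edges among the neighbours of a vertex and divides by the \(k_i(k_i-1)/2\) edges that could exist between them.

```r
library(igraph)

# Triangle 1-2-3 with a pendant vertex 4 attached to vertex 3
g_small <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)

# Hypothetical helper: local clustering coefficient of vertex v (undirected graph)
local_cc <- function(graph, v) {
  nb <- neighbors(graph, v)
  k <- length(nb)
  if (k < 2) return(NaN)  # undefined for fewer than two neighbours
  # edges that actually exist among the neighbours of v
  links <- ecount(induced_subgraph(graph, nb))
  2 * links / (k * (k - 1))
}

cc_manual <- local_cc(g_small, 3)
cc_igraph <- transitivity(g_small, type = "local", vids = 3)

print(c(manual = cc_manual, igraph = cc_igraph))
```

Vertex 3 has three neighbours (1, 2 and 4), but only one of the three possible edges among them exists, so its coefficient is 1/3. Vertices 1 and 2 have a coefficient of 1.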
Graph from ww2
Exercises:
Load the dataset middleschool from the package UserNetR. Remember to convert the network to an igraph object (package intergraph). Print the number of vertices and edges. Is this network directed? Compute the dyad census, the reciprocity, the triad census and the transitivity (clustering coefficient). Which differences in these measures can you find compared with the AIDS blog network? Use one (or two) data.frame to compare the two networks. Remember to correct for the difference in size (the AIDS blog network is four times as large as the middleschool network). What could be the reasons for the differences?
Like homophily, social capital is not as crisply defined a term as it may seem. We can roughly differentiate two dimensions along which major definitions of social capital can be categorized: resources vs. structure, and a focus on the individual vs. the group. Lin (2001) provides a good overview of the general concept.
The structural approach at the group level mainly dates back to Coleman (1958), who proposed that social capital should be operationalized as the cohesion of a network or a subgroup. He claimed that cohesion produces trust and predictability and enables cooperation. This draws on Rational Choice Theory: in stable, cohesive groups, members can observe each other’s behavior, which prevents defection. See network cohesion measures for descriptives.
While addressing a similar problem as Coleman, Robert Putnam (2001) claims that groups (and indirectly the individuals in them) need civic engagement. This approach therefore focuses less on the structural properties of the network and more on what people do in or bring into the network, hence the label “resource-based approach”. A simple way to operationalize it is to calculate average levels of, e.g., voter turnout or participation in civic organizations per group or subgroup.
Outline
Reciprocity, triads, triadic closure (global clustering coefficient), balance theory
Social capital through embeddedness of actors
Local clustering coefficient, strength of weak ties, structural holes
Social capital on the group level
¹ A simple graph is a graph that does not contain multiple edges or loops.
Social capital on the individual level
Structural approach
Vertex importance
The structural approach on the individual level relates mainly to the works of Mark Granovetter (1973) and Ronald Burt (2009).
Before we dive into their theoretical work, note that the vertices of a graph can provide rich information about a network, its structure and its dynamics. In sociological and psychological contexts this is particularly true, because more often than not vertices represent people. The fact that people play different roles and have different influences inside groups and communities has motivated centuries of sociological and psychological research, so it is unsurprising that the concept of vertex importance and influence is of great interest in the study of people or organizational networks.
But importance and influence are not precisely defined concepts, and to make them real within the context of graphs and networks we need to find some sort of mathematical definition for them. In many visual graph layouts, more important or influential vertices that have stronger roles in overall connectivity will usually be positioned toward the center of a group of other vertices. Intuitively therefore, we use the term ‘centrality’ to describe the importance or influence of a vertex in the connected structure of a graph.
IMAGE OF A GRAPH where each centrality has a different node with different measures
Degree centrality:
The degree centrality (or valence) of a vertex \(v\) is the number of edges connected to \(v\). It is thus a measure of immediate connections. Related to the concept of degree is the ego size. The \(n\)th order ego network of a given vertex \(v\) is the set that includes \(v\) itself and all vertices that are reachable from \(v\) by a path of length at most \(n\). The size of the \(n\)th order ego network is the number of vertices in it.
Out-degree: the number of edges going out from a vertex.
In-degree: the number of edges going into a vertex.
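A minimal sketch of these degree measures, assuming the small directed example graph from the reciprocity section (rebuilt here as `g_example`, a name chosen for this illustration):

```r
library(igraph)

# Small directed example graph (5 vertices, 7 edges)
g_example <- graph_from_literal(1-+2, 2-+3, 2+-3, 5-+2, 3-+5, 5-+3, 3-+4)

# Number of incoming and outgoing edges per vertex
in_deg <- degree(g_example, mode = "in")
out_deg <- degree(g_example, mode = "out")

# Total degree ignores edge direction
total_deg <- degree(g_example, mode = "all")

# Size of the first-order ego network of vertex "3" (the vertex itself included)
ego_3 <- ego_size(g_example, order = 1, nodes = "3", mode = "all")

print(rbind(in_deg, out_deg, total_deg))
print(ego_3)
```

Vertex 2 receives the most ties, while vertex 3 sends the most. This is why in-degree and out-degree are usually reported separately in directed networks.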
Plot of a small network with in-degree as vertex label and size
Closeness centrality:
The closeness centrality of a vertex \(v\) in a connected graph is the inverse of the sum of the distances from \(v\) to all other vertices.
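As a quick check of this definition, the sketch below computes closeness by hand from the distance matrix of a three-vertex path (a toy graph made up for illustration) and compares it with igraph's unnormalized `closeness()`.

```r
library(igraph)

# Path graph 1 - 2 - 3
g_path <- make_graph(c(1, 2, 2, 3), directed = FALSE)

# Matrix of shortest-path distances between all pairs of vertices
d <- distances(g_path)

# Closeness: inverse of the summed distances to all other vertices
closeness_manual <- 1 / rowSums(d)

# The same quantity as computed by igraph (unnormalized by default)
closeness_igraph <- closeness(g_path)

print(rbind(closeness_manual, closeness_igraph))
```

The middle vertex is one step from both others and gets 1/2. Each end vertex is one and two steps away from the others and gets 1/3.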
Betweenness centrality
Eigenvector centrality:
The Eigenvector centrality (or prestige) is a measure of how connected a vertex is to other influential vertices in the graph. It is impossible to define this without a little linear algebra.
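The linear algebra involved can be sketched in a few lines. Under the usual definition, the centrality scores are the entries of the eigenvector that belongs to the largest eigenvalue of the adjacency matrix. The block below computes it with base R for a small undirected toy graph (made up for illustration) and rescales it so that the largest score is 1, which is also how igraph's `eigen_centrality()` scales its result by default.

```r
library(igraph)

# Triangle 1-2-3 with a pendant vertex 4 attached to vertex 3
g_small <- make_graph(c(1, 2, 2, 3, 1, 3, 3, 4), directed = FALSE)

# Adjacency matrix of the graph
adj <- as_adjacency_matrix(g_small, sparse = FALSE)

# For a symmetric matrix, eigen() sorts eigenvalues in decreasing order,
# so the first eigenvector belongs to the largest eigenvalue
ev <- eigen(adj, symmetric = TRUE)
leading <- abs(ev$vectors[, 1])  # the sign of an eigenvector is arbitrary

# Rescale so that the most central vertex has a score of 1
ec_manual <- leading / max(leading)

# The same scores as computed by igraph
ec_igraph <- eigen_centrality(g_small)$vector

print(rbind(ec_manual, ec_igraph))
```

Vertex 3 scores highest because it is connected to every other vertex. Vertex 4 scores lowest, although its only neighbour is the most central vertex.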
Which node in the network has the highest
Exercise: Why your friends have more friends than you do (Feld 1991)
The strength of weak ties
Structural holes
Remember last session’s excursus on cutpoints and bridges?
Resource-based approach
This approach is closely linked to the name of Pierre Bourdieu (1982). He proposed that success depends on the level of capital individuals can mobilize. While Bourdieu remains a bit vague about what he means, Lin (2001) offers some clearer definitions. Basically, the concept boils down to the access to resources through the social network. In the simplest terms, we could operationalize this as the resources possessed by the (direct) social ties of ego.